A middlebox is a networking device that can inspect, filter, transform, and otherwise manipulate Internet traffic between clients and servers. When a middlebox interferes with traffic that is deemed restricted, for reasons such as copyright infringement or Internet censorship, the practice is known as connection tampering.
Although middleboxes are often deployed to improve security and performance, their use needs to be audited and measured, both to monitor opaque restrictions on Internet freedom and to help content providers understand and explain why their content is unavailable.
So far, the community’s understanding of connection tampering has been driven solely by active measurements. These involve obtaining access to certain networks, by purchasing vantage points or recruiting volunteers, and then testing the reachability of Internet content from within those networks. However, active measurements are inherently limited by the unavailability of user-driven real-world data, the lack of accessible vantage points in many networks, and the need to identify and update important content to test.
In light of this, my colleagues and I from the University of Michigan, Cloudflare, EPFL, and the University of Maryland developed a completely passive methodology that allows us to detect connection tampering from real-world user data.
- Middleboxes tampering with client traffic exhibit traffic patterns that do not look like typical client traffic to a server.
- Researchers developed a set of 19 tampering signatures, packet sequences that may indicate middlebox tampering.
- Some signatures are predominantly observed only in certain regions (for example, PSH ⟶ RST;RST₀ is only seen in China) while others are observed globally.
How to Detect Connection Tampering Passively?
As shown in Figure 1, a typical browser connection to a web server involves the establishment of a TCP handshake (also called a SYN handshake) and then the exchange of multiple data packets. The first data packet typically contains the TLS Client Hello or GET request for HTTP connections. After the exchange of data is completed, the connection is terminated using a FIN handshake.
However, when middleboxes tamper with traffic, they cause traffic patterns that are different from a typical TCP connection. Middleboxes tamper with traffic by either:
- Injecting packets that are designed to terminate connections, such as TCP Reset (RST) packets (Figure 2), or
- Dropping packets, which forces both the client and the server to close the connection (Figure 3).
The presence of TCP RST packets or packet drops is a key indicator of connection tampering. While TCP RSTs and packet loss are extremely common on the Internet, seeing such patterns at scale exactly at the stage where middleboxes act (such as immediately after the TLS Client Hello) strongly suggests deliberate tampering. For example:
- The censorship apparatus in China (also called the Great Firewall) injects multiple RST packets after observing a restricted domain name in the TLS Client Hello.
- Censorship middleboxes in Iran drop the TLS Client Hello packet when tampering.
Using insights from prior work in censorship detection and manual investigation of large-scale passive data collected by a large CDN, we developed a set of 19 tampering signatures, packet sequences that may indicate middlebox tampering. These packet sequences detect tampering at various stages of a TCP connection and primarily detect RST injection and packet drop-based tampering.
These signatures do not detect only tampering: certain client behaviors (such as Happy Eyeballs, scanning, and forceful connection closures) can produce the same patterns. Even so, we hope that our study reveals large-scale patterns that can be studied further through active measurements. We provide a detailed analysis of each signature, along with an evaluation, in our paper.
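To make the idea of a stage-based signature concrete, here is a minimal Python sketch that labels a connection from the server's point of view. It is not the set of 19 signatures from our paper: the `Packet` type, the `classify` function, the `closed_by_timeout` flag, and the label strings are all hypothetical and chosen only for illustration. The sketch tracks how far the connection progressed (SYN, handshake ACK, first data packet, later data) and reports whether it then ended with an RST or simply went silent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    """One inbound packet as seen by the server (hypothetical record type)."""
    flags: frozenset       # TCP flags set on the packet, e.g. {"SYN"}
    payload_len: int = 0   # number of payload bytes carried


def pkt(*flags: str, payload_len: int = 0) -> Packet:
    """Small helper for building packets in examples."""
    return Packet(frozenset(flags), payload_len)


def classify(packets: List[Packet], closed_by_timeout: bool) -> Optional[str]:
    """Return an illustrative label such as 'PSH->RST', or None.

    The label records the last stage reached before the connection ended
    abnormally. A match is only a hint: ordinary client behavior can
    produce the same sequence.
    """
    stage = "none"
    for p in packets:
        if "RST" in p.flags:
            return f"{stage}->RST"      # connection reset at this stage
        if "FIN" in p.flags:
            return None                 # graceful close, nothing to flag
        if "SYN" in p.flags:
            stage = "SYN"
        elif p.payload_len > 0:
            # The first data packet usually carries the TLS Client Hello
            # or HTTP GET; later data packets are tracked separately.
            stage = "PSH" if stage in ("SYN", "ACK") else "DATA"
        elif "ACK" in p.flags and stage == "SYN":
            stage = "ACK"               # handshake completed
    if closed_by_timeout and stage != "none":
        return f"{stage}->TIMEOUT"      # packets stopped arriving
    return None


if __name__ == "__main__":
    reset_after_hello = [
        pkt("SYN"),
        pkt("ACK"),
        pkt("PSH", "ACK", payload_len=200),
        pkt("RST"),
    ]
    print(classify(reset_after_hello, closed_by_timeout=False))
```

A real pipeline would need more context than this sketch uses, such as sequence and acknowledgment numbers and timing, before treating any match as evidence of tampering.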
Global Analysis of Connection Tampering
We applied our signatures to 0.01% of all incoming connections at Cloudflare, a large CDN provider with more than 275 points of presence and global connectivity. The results shown in Figure 4 are from two weeks of passive data in January 2023.
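A 0.01% sample corresponds to roughly one connection in every 10,000. This post does not describe how the sampling was implemented, so the following Python sketch shows just one common approach, under our own assumptions: hash a connection identifier and keep the connection if the hash falls into a single bucket out of 10,000. The function name `is_sampled` and the choice of the address and port 4-tuple as the key are hypothetical.

```python
import hashlib

# Assumption: keep 1 in 10,000 connections, i.e. 0.01%.
SAMPLE_DENOMINATOR = 10_000


def is_sampled(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bool:
    """Deterministically decide whether a connection is in the sample.

    Hashing the 4-tuple means every packet of the same connection gets
    the same decision, without keeping per-connection state.
    """
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % SAMPLE_DENOMINATOR
    return bucket == 0


if __name__ == "__main__":
    # Documentation-range addresses (RFC 5737), used purely as placeholders.
    print(is_sampled("192.0.2.10", 51234, "198.51.100.7", 443))
```

Because the decision depends only on the connection identifier, it is repeatable, which makes it possible to collect every packet of a sampled connection and none from the rest.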
They show that some signatures are predominantly observed only in certain regions (for example, PSH ⟶ RST;RST₀ is only seen in China) while others are observed globally (for example, PSH;Data ⟶ RST). This indicates that our signatures capture connection closures that are common across different countries (and may not be due to tampering) as well as specific properties of well-known censorship systems such as the Great Firewall. Our results highlight regions that require deeper focus (for example, Peru and Uzbekistan) while also confirming observations about censorship systems in previous work (China and Iran).
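A per-country comparison like this boils down to computing, for each country, the share of sampled connections that match each signature. The Python sketch below illustrates that aggregation under simple assumptions: the input is a list of `(country, label)` pairs, where `label` is `None` for connections that match no signature. The function name `signature_rates`, the placeholder country codes, and the label strings are hypothetical, and the numbers are made up for the example rather than taken from our data.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def signature_rates(
    records: Iterable[Tuple[str, Optional[str]]]
) -> Dict[str, Dict[str, float]]:
    """Return, per country, the fraction of connections matching each label."""
    totals: Counter = Counter()                 # all connections per country
    matches: Dict[str, Counter] = defaultdict(Counter)
    for country, label in records:
        totals[country] += 1
        if label is not None:
            matches[country][label] += 1
    return {
        country: {
            label: count / totals[country]
            for label, count in matches[country].items()
        }
        for country in totals
    }


if __name__ == "__main__":
    # Synthetic records with placeholder country codes "AA" and "BB".
    sample = [
        ("AA", "PSH->RST"),
        ("AA", None),
        ("AA", "PSH->RST"),
        ("AA", None),
        ("BB", None),
    ]
    print(signature_rates(sample))
```

A signature whose rate is high in one country and close to zero elsewhere would be a candidate for region-specific tampering, while a signature with similar rates everywhere is more likely to reflect ordinary client behavior.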
Passive measurements also allowed us to see trends in real-world tampering over time. Figure 5 shows that tampering accounts for a larger percentage of traffic from certain countries at certain times and days, particularly in Russia and Iran.
In Iran, we can also see how tampering increased following the protests in September 2022 (Figure 6).
Increasing the Resolution of Internet Health
We view passive measurements as a powerful complement to active measurement strategies, and together these techniques can provide us with a much more comprehensive picture of connection tampering globally. We encourage service providers and ISPs to adopt our technique to contribute to the community’s knowledge about connection termination. Please reach out!
Learn more via our SIGCOMM Research paper and SIGCOMM Talk.
Ram Sundara Raman is a PhD Candidate at the University of Michigan whose research focuses on measuring large-scale network interference and censorship. The views expressed by the authors of this blog are their own and do not necessarily reflect the views of the Internet Society.